Key word extraction for short text via word2vec, doc2vec, and textrank
Similar resources
Word2Vec inversion and traditional text classifiers for phenotyping lupus
BACKGROUND: Identifying patients with certain clinical criteria through manual chart review of doctors' notes is a daunting task given the massive volume of text notes in electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this res...
Convolutional Sentence Kernel from Word Embeddings for Short Text Categorization
This paper introduces a convolutional sentence kernel based on word embeddings. The kernel overcomes the sparsity issue that arises when classifying short documents or when training data is scarce. Experiments on six sentence datasets showed statistically significantly higher accuracy than the standard linear kernel with n-gram features and other proposed models.
Improved Automatic Keyword Extraction Based on TextRank Using Domain Knowledge
Keyword extraction from scientific articles is useful for retrieving articles on a given topic and for following trends in academic development. For the task of keyword extraction for Chinese scientific articles, we adopt the framework of selecting keyword candidates by Document Frequency Accessor Variety (DF-AV) and running the TextRank algorithm on a phrase network. To improve domain ...
Word Extraction and Recognition in Arabic Handwritten Text
Segmenting Arabic manuscripts into text lines and words is an important step in making recognition systems more efficient and accurate. The main difficulty in this task is the word extraction process: first, words are often a succession of sub-words, and the spacing between these sub-words does not follow any rule. Second, the presence of connections even between non-adjacent sub-w...
Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations
As a prevalent form of social communication on the Internet, billions of short texts are generated every day. Discovering knowledge from them has attracted considerable interest from both industry and academia. Short texts carry limited contextual information and are sparse, noisy, and ambiguous; hence, automatically learning topics from them remains an important challenge. To tackle ...
Journal
Journal title: TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
Year: 2019
ISSN: 1303-6203
DOI: 10.3906/elk-1806-38